Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dropbox Document Loader #7301

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

Ser0n-ath
Copy link

This PR introduces support for the Dropbox Document Loader.

  • Added documentation for Dropbox loader
  • Added code examples
  • Added Dropboxloader class to load documents
  • integration tests for the loader

This is our first contribution to Langchain. I'd appreciate any feedback and suggestions for this PR.

Fixes #7031

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Nov 30, 2024
Copy link

vercel bot commented Nov 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-docs ✅ Ready (Inspect) Visit Preview Jan 2, 2025 3:56am
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ⬜️ Ignored (Inspect) Jan 2, 2025 3:56am

@dosubot dosubot bot added auto:documentation Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Nov 30, 2024
@Ser0n-ath
Copy link
Author

Hi @jacoblee93, Can you take a look at this PR? I'd appreciate your feedback.

Copy link
Collaborator

@jacoblee93 jacoblee93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay!

accessToken: "your-dropbox-access-token",
},
unstructuredOptions: {
apiUrl: "http://localhost:8000/general/v0/general", // Replace with your Unstructured API URL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's emphasize somewhere that this wraps Unstructured

Should we call this DropboxUnstructuredLoader instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can rename the loader class to DropboxUnstructuredLoader
I want to confirm if I need to rename the file to say dropbox_unstructured.ts as well?

Also, I noticed that a few preexisting loaders utilize unstructured as well. Would they need to be renamed as well in the future?:

console.log(docs[0].pageContent);
```

## Configuration Options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just link to API refs instead

Copy link
Author

@Ser0n-ath Ser0n-ath Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fine? dccce35

/**
* The file system module to use. Defaults to Node's `fs` module.
*/
fs?: typeof fsDefault;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need fs

Copy link
Author

@Ser0n-ath Ser0n-ath Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that a consumer might want to pass their own instance of fs to control where downloaded assets are stored in the interim before being sent out to Unstructured.

However by 7027578 the fs option is redundant and has been removed.


await fsPromises.mkdir(path.dirname(localFilePath), { recursive: true });

await fsPromises.writeFile(localFilePath, fileBinary, {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to not need to use the file system

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to not use the file system 7027578

@jacoblee93 jacoblee93 added the close PRs that need one or two touch-ups to be ready label Dec 24, 2024
@Ser0n-ath
Copy link
Author

Hi @jacoblee93, If you get a chance, would you mind reviewing some of the recent changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:documentation Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features close PRs that need one or two touch-ups to be ready size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Dropbox Document Loader Support to LangchainJS
4 participants